In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good policy without interacting with the environment. A major challenge in applying such methods in practice is the lack of both theoretically principled and practical tools for model selection and evaluation. To address this, we study the problem of model selection in offline RL with value function approximation. The learner is given a nested sequence of model classes to minimize squared Bellman error and must select among these to achieve a balance between the approximation and estimation errors of the classes. We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors. The algorithm, ModBE, takes as input a collection of candidate model classes and a generic base offline RL algorithm. By successively eliminating model classes using a novel one-sided generalization test, ModBE returns a policy with regret scaling with the complexity of the minimally complete model class. In addition to its theoretical guarantees, ModBE is conceptually simple and computationally efficient, amounting to solving a series of square loss regression problems and then comparing relative square losses between classes. We conclude with several numerical simulations showing that it reliably selects a good model class.
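The elimination procedure lends itself to a compact summary. Below is a minimal sketch of that loop, assuming hypothetical `fit_bellman_regression` and `margin` helpers in place of the paper's actual routines; it illustrates the structure (fit every nested class by square-loss regression, keep the smallest class that survives the one-sided comparison), not the precise statistical test.

```python
def modbe_sketch(model_classes, fit_bellman_regression, margin):
    """Schematic of a ModBE-style elimination loop (illustrative only).

    model_classes: nested candidate classes, least to most complex.
    fit_bellman_regression(cls): hypothetical base routine returning
        (solution, empirical_square_loss) for a class.
    margin(k, j): hypothetical complexity-dependent slack for comparing
        class k against a richer class j.
    """
    fits = [fit_bellman_regression(cls) for cls in model_classes]
    for k in range(len(fits)):
        sol_k, loss_k = fits[k]
        # One-sided generalization test: eliminate class k only if some
        # richer class improves the square loss by more than the slack.
        if all(loss_k <= fits[j][1] + margin(k, j)
               for j in range(k + 1, len(fits))):
            return sol_k  # smallest class that survives the test
    return fits[-1][0]  # fall back to the richest class
```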
We study the problem of model selection in batch policy optimization: given a fixed, partial-feedback dataset and $M$ model classes, learn a policy whose performance is competitive with that of the best model class. We formalize the problem in the contextual bandit setting with linear model classes by identifying three sources of error that any model selection algorithm must optimally trade off: (1) approximation error, (2) statistical complexity, and (3) coverage. The first two sources are familiar from model selection in supervised learning, where optimally trading them off is well studied. In contrast, the third source is unique to batch policy optimization and arises from the dataset shift inherent to the setting. We first show that no batch policy optimization algorithm can simultaneously achieve a guarantee with respect to all three, demonstrating a stark contrast between the difficulty of batch policy optimization and the positive results available in supervised learning. Despite this negative result, we show that relaxing any one of the three error sources enables algorithms that achieve near-oracle inequalities for the remaining two. We conclude with experiments demonstrating the efficacy of these algorithms.
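To make the three-way tension concrete, the ideal guarantee would be an oracle inequality that is simultaneously tight in all three terms, schematically of the stylized form below (all symbols are illustrative notation, not the paper's): $\hat{\pi}$ is the learned policy, $n$ the dataset size, and, for each class $m$, $\epsilon_{\mathrm{approx}}(m)$, $\mathrm{comp}(m)$, and $C_{\mathrm{cov}}(m)$ denote its approximation error, statistical complexity, and coverage coefficient. The negative result above says no algorithm can attain this form in general; relaxing any one term makes the other two achievable.

```latex
\mathrm{SubOpt}(\hat{\pi})
  \;\lesssim\;
  \min_{m \in [M]} \Big\{ \epsilon_{\mathrm{approx}}(m)
  \;+\; C_{\mathrm{cov}}(m) \sqrt{\mathrm{comp}(m)/n} \Big\}
```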
Climate change remains a pressing issue that is currently affecting society at large. It is important that we as a society, including the computer vision (CV) community, take steps to limit our impact on the environment. In this paper, we (a) analyze the effect of diminishing returns in CV methods, and (b) propose "NoFADE": a novel entropy-based metric to quantify model-dataset-complexity relationships. We show that some CV tasks are reaching saturation, while others are almost fully saturated. In this light, NoFADE allows the CV community to compare models and datasets on a like-for-like basis, establishing a common platform for comparison.
Domain adaptation is a popular paradigm in modern machine learning which aims to tackle the divergence between the labeled training or validation datasets used for learning and testing a classifier (the source domain) and a potentially large unlabeled dataset on which the model is actually deployed (the target domain). The task is to find a common representation of the source and target datasets in which the source dataset remains informative for training and the divergence between source and target is minimized. Currently, the most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, which are data-hungry and often difficult to train. We propose a method called Domain Adaptation Principal Component Analysis (DAPCA) that finds a linear, reduced data representation useful for solving the domain adaptation task. DAPCA is based on introducing positive weights between pairs of data points and generalizes the supervised extension of principal component analysis. DAPCA is an iterative algorithm that solves a simple quadratic optimization problem at each iteration. Convergence of the algorithm is guaranteed, and the number of iterations is small in practice. We validate the proposed algorithm on previously suggested benchmarks for the domain adaptation task, and also show the benefit of using DAPCA in the analysis of single-cell omics datasets in biomedical applications. Overall, DAPCA can serve as a useful preprocessing step in many machine learning applications, taking into account the possible divergence between source and target domains.
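To make the pairwise-weight idea concrete, here is a minimal sketch of a weighted, PCA-like linear reduction in the spirit of the description above; the Laplacian-style objective and the uniform-weight example are illustrative assumptions, not the authors' exact DAPCA iteration.

```python
import numpy as np

def weighted_pca_sketch(X, W, n_components=2):
    """Pairwise-weighted PCA sketch (illustrative, not the exact DAPCA update).

    X: (n, d) centered data matrix; W: (n, n) symmetric non-negative weights.
    Maximizes sum_ij W_ij * ||V^T x_i - V^T x_j||^2 over orthonormal V, which
    reduces to the top eigenvectors of X^T L X for the graph Laplacian L.
    """
    D = np.diag(W.sum(axis=1))
    L = D - W                                    # graph Laplacian of the weights
    eigvals, eigvecs = np.linalg.eigh(X.T @ L @ X)
    V = eigvecs[:, -n_components:]               # top eigenvectors
    return X @ V                                 # reduced representation

# Toy check: with uniform weights this recovers classical PCA directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X -= X.mean(axis=0)
Z = weighted_pca_sketch(X, np.ones((100, 100)))
```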
Multi-query planning algorithms find paths between various different starts and goals in a single search space. They are designed to do this efficiently by reusing information across planning queries. This information may be computed before or during the search and often includes knowledge of valid paths. Using a known valid path to solve an individual planning query takes less computational effort than finding a completely new solution. This allows multi-query algorithms, such as PRM*, to outperform single-query algorithms, such as RRT*, on many problems, but their relative performance depends on how much information is reused. Despite this, few multi-query planners explicitly seek to maximize path reuse and, as a result, many do not consistently outperform single-query alternatives. This paper presents Effort Informed Roadmaps (EIRM*), an almost-surely asymptotically optimal multi-query planning algorithm that explicitly prioritizes reusing computational effort. EIRM* uses an asymmetric bidirectional search to identify existing paths that may help solve an individual planning query, and then uses this information to order its search and reduce computational effort. This allows it to find initial solutions up to an order of magnitude faster than state-of-the-art planning algorithms on the tested abstract and robotic multi-query planning problems.
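The ordering idea here — run a cheap reverse search over the roadmap and use its result to steer the forward search toward reusable work — can be sketched with a standard reverse Dijkstra in which already-validated edges are assigned low effort. This illustrates the general principle only; the `neighbors` and `edge_effort` callbacks are hypothetical, and the sketch is not the EIRM* algorithm itself.

```python
import heapq

def reverse_effort_to_go(goal, neighbors, edge_effort):
    """Reverse Dijkstra: cheapest remaining validation *effort* from every
    roadmap vertex to the goal (illustrative of effort-informed ordering).

    neighbors(v): hypothetical callback yielding vertices adjacent to v.
    edge_effort(u, v): hypothetical cost, low for already-validated edges.
    """
    dist = {goal: 0.0}
    pq = [(0.0, goal)]
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist.get(v, float("inf")):
            continue  # stale queue entry
        for u in neighbors(v):
            nd = d + edge_effort(u, v)
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(pq, (nd, u))
    return dist  # forward search can expand low-effort vertices first
```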
Domain adaptation (DA) has recently raised strong interest in the medical imaging community. While a large variety of DA techniques have been proposed for image segmentation, most of them have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (CrossMoDA) challenge was organized in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The goal of the challenge is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS itself and the cochleas. Currently, the diagnosis and surveillance of VS patients are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. We therefore created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 scans (n = 105) and unpaired, non-annotated hrT2 scans (n = 105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on the hrT2 scans provided in the test set (n = 137). A total of 16 teams submitted algorithms for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice - VS: 88.4%; cochleas: 85.7%) and close to full supervision (median Dice - VS: 92.5%; cochleas: 87.7%). All top-performing methods used an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source images.
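The winning recipe is thus a two-stage pipeline: translate annotated source images into the target modality, then train a segmenter on the pseudo-target images with the source labels. The following is a minimal schematic of that structure; the single-layer placeholder networks, random tensors, and loss choice are illustrative assumptions, not any team's actual system.

```python
import torch
import torch.nn as nn

# Stage 1: image-to-image translation (ceT1 -> pseudo-hrT2). A real system
# would use an unpaired translation model; a conv layer stands in here purely
# to show the data flow, and is assumed to be already trained.
translator = nn.Conv2d(1, 1, kernel_size=3, padding=1)

# Stage 2: segmentation network trained on pseudo-target images with the
# *source* annotations (3 classes: background, VS, cochlea).
segmenter = nn.Conv2d(1, 3, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(segmenter.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

cet1_batch = torch.randn(4, 1, 64, 64)       # annotated source images (toy)
labels = torch.randint(0, 3, (4, 64, 64))    # manual source annotations (toy)

with torch.no_grad():
    pseudo_hrt2 = translator(cet1_batch)     # pseudo-target-domain images

loss = criterion(segmenter(pseudo_hrt2), labels)  # supervised by source labels
loss.backward()
optimizer.step()
```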
Automatically adapting game content to players opens new doors for game development. In this paper, we present an architecture using persona agents and experience metrics that enables procedurally generated levels to be targeted at a specific player persona. Using our game "Grave Rave", we demonstrate that this approach successfully adapts to three different rule-based persona agents across three different experience metrics. Furthermore, the adaptation is shown to be specific in nature, meaning that the levels are persona-aware and not merely optimized in general with respect to the selected metric.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
We present a unified and compact representation for object rendering, 3D reconstruction, and grasp pose prediction that can be inferred from a single image within a few seconds. We achieve this by leveraging recent advances in the Neural Radiance Field (NeRF) literature that learn category-level priors and fine-tune on novel objects with minimal data and time. Our insight is that we can learn a compact shape representation and extract meaningful additional information from it, such as grasping poses. We believe this to be the first work to retrieve grasping poses directly from a NeRF-based representation using a single viewpoint (RGB-only), rather than going through a secondary network and/or representation. When compared to prior art, our method is two to three orders of magnitude smaller while achieving comparable performance at view reconstruction and grasping. Accompanying our method, we also propose a new dataset of rendered shoes for training a sim-2-real NeRF method with grasping poses for different widths of grippers.